Human or AI? Comparing Design Thinking Assessments by Teaching Assistants and Bots
Khan, Sumbul, Liow, Wei Ting, Ang, Lay Kee
ORCID: 0000-0003-2811-1194
Abstract -- As design thinking education grows in secondary and tertiary education, educators face a mounting challenge in evaluating creative artefacts that comprise visual and textual elements. Traditional, rubric-based methods of assessment are laborious, time-consuming, and inconsistent, owing to their reliance on Teaching Assistants (TAs) in large, multi-section cohorts. This paper presents an exploratory study investigating the reliability and perceived accuracy of AI-assisted assessment vis-à-vis TA-assisted assessment in evaluating student posters in design thinking education. Two activities were conducted with 33 Ministry of Education (MOE), Singapore school teachers, with the objectives of (1) comparing AI-generated scores with TA grading across three key dimensions: empathy and user understanding, identification of pain points and opportunities, and visual communication, and (2) understanding teacher preferences among AI-assigned, TA-assigned, and hybrid scores. Results showed low statistical agreement between instructor and AI scores for empathy and pain points, though slightly higher alignment for visual communication. Teachers generally preferred TA-assigned scores in six of ten samples. Qualitative feedback highlighted AI's potential for formative feedback, consistency, and student self-reflection, but raised concerns about its limitations in capturing contextual nuance and creative insight. The study underscores the need for hybrid assessment models that integrate computational efficiency with human insight. This research contributes to the evolving conversation around responsible AI adoption in creative disciplines, emphasizing the balance between automation and human judgment for scalable and pedagogically sound assessment practices.
Design thinking is a human-centered approach to innovation that draws from the designer's toolkit to integrate the needs of people, the possibilities of technology, and the requirements for business success. It is a non-linear, iterative process that teams use to understand users, challenge assumptions, redefine problems, and create innovative solutions to prototype and test.
- Asia > Singapore (0.27)
- Europe > Netherlands > South Holland > Delft (0.04)
- Education > Educational Setting > Higher Education (1.00)
- Education > Assessment & Standards (1.00)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.68)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.86)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
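The score comparison described in the study above can be sketched with a quadratic-weighted Cohen's kappa, a common agreement statistic for ordinal rubric scores. The function and the ten score pairs below are illustrative only; the paper does not state which agreement statistic it used, and these are not its data.

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, n_levels):
    """Quadratic-weighted Cohen's kappa for two raters on an
    ordinal scale 0..n_levels-1 (1.0 = perfect agreement)."""
    assert len(a) == len(b)
    n = len(a)
    obs = Counter(zip(a, b))          # observed joint counts
    ca, cb = Counter(a), Counter(b)   # marginal counts per rater
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # quadratic disagreement weight
            num += w * obs.get((i, j), 0) / n
            den += w * (ca.get(i, 0) / n) * (cb.get(j, 0) / n)
    return 1.0 - num / den

# Hypothetical 0-4 rubric scores for ten posters, one dimension.
ta = [3, 4, 2, 3, 1, 4, 2, 3, 0, 2]
ai = [2, 4, 3, 3, 2, 3, 2, 4, 1, 2]
print(round(quadratic_weighted_kappa(ta, ai, 5), 3))
```

A kappa near zero would match the "low statistical agreement" reported for empathy and pain points.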
Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Kortemeyer, Gerd, Caspar, Alexander, Horica, Daria
We investigate whether contemporary multimodal LLMs can assist with grading open-ended calculus at scale without eroding validity. In a large first-year exam, students' handwritten work was graded by GPT-5 against the same rubric used by teaching assistants (TAs), with fractional credit permitted; TA rubric decisions served as ground truth. We calibrated a human-in-the-loop filter that combines a partial-credit threshold with an Item Response Theory (2PL) risk measure based on the deviation between the AI score and the model-expected score for each student-item. Unfiltered AI-TA agreement was moderate, adequate for low-stakes feedback but not for high-stakes use. Confidence filtering made the workload-quality trade-off explicit: under stricter settings, AI delivered human-level accuracy, but also left roughly 70% of the items to be graded by humans. Psychometric patterns were constrained by low stakes on the open-ended portion, a small set of rubric checkpoints, and occasional misalignment between designated answer regions and where work appeared. Practical adjustments, such as slightly higher weight and protected time, a few rubric-visible substeps, and stronger spatial anchoring, should raise ceiling performance. Overall, calibrated confidence and conservative routing enable AI to reliably handle a sizable subset of routine cases while reserving expert judgment for ambiguous or pedagogically rich responses.
- North America > United States > Michigan (0.04)
- North America > United States > New York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (2 more...)
- Instructional Material (0.68)
- Research Report (0.50)
- Education > Assessment & Standards (0.93)
- Education > Curriculum > Subject-Specific Education (0.70)
- Education > Educational Setting (0.66)
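The routing filter described in the abstract above combines a partial-credit check with a 2PL deviation measure. A minimal sketch of that idea follows; the threshold values and the item tuples are hypothetical, not the paper's calibrated settings.

```python
import math

def expected_score(theta, a, b):
    """2PL item response function: model-expected (fractional) score for a
    student of ability theta on an item with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def route(ai_score, theta, a, b, lo=0.3, hi=1.0, risk_cap=0.35):
    """Route an item to a human grader when the AI awarded ambiguous partial
    credit, or when the AI score deviates too far from the 2PL expectation.
    Thresholds lo and risk_cap are illustrative, not the paper's values."""
    if lo < ai_score < hi:  # ambiguous partial credit
        return "human"
    risk = abs(ai_score - expected_score(theta, a, b))
    return "human" if risk > risk_cap else "ai"

# Hypothetical (AI fractional score, ability, discrimination, difficulty) tuples.
items = [(1.0, 1.2, 1.0, 0.0), (0.5, -0.5, 1.2, 0.3), (0.0, -1.5, 0.8, 0.5)]
print([route(*item) for item in items])
```

Tightening `risk_cap` routes more items to humans, which is the workload-quality trade-off the abstract describes.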
A Review of Generative AI in Computer Science Education: Challenges and Opportunities in Accuracy, Authenticity, and Assessment
Reihanian, Iman, Hou, Yunfei, Chen, Yu, Zheng, Yifei
This paper surveys the use of Generative AI tools, such as ChatGPT and Claude, in computer science education, focusing on key aspects of accuracy, authenticity, and assessment. Through a literature review, we highlight both the challenges and opportunities these AI tools present. While Generative AI improves efficiency and supports creative student work, it raises concerns such as AI hallucinations, error propagation, bias, and blurred lines between AI-assisted and student-authored content. Human oversight is crucial for addressing these concerns. Existing literature recommends adopting hybrid assessment models that combine AI with human evaluation, developing bias detection frameworks, and promoting AI literacy for both students and educators. Our findings suggest that the successful integration of AI requires a balanced approach, considering ethical, pedagogical, and technical factors. Future research may explore enhancing AI accuracy, preserving academic integrity, and developing adaptive models that balance creativity with precision.
- Europe > Finland > Southwest Finland > Turku (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Education > Educational Setting (1.00)
- Education > Curriculum > Subject-Specific Education (0.90)
- Education > Assessment & Standards (0.68)
- Education > Educational Technology > Educational Software (0.68)
Pensieve Grader: An AI-Powered, Ready-to-Use Platform for Effortless Handwritten STEM Grading
Yang, Yoonseok, Kim, Minjune, Rondinelli, Marlon, Shao, Keren
Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline-from scanned student submissions to final feedback-within a human-in-the-loop interface. Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.
- North America > United States (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
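Pensieve's reported 95.4% agreement on high-confidence predictions suggests a simple confidence-gated workflow: auto-accept AI scores above a confidence threshold and route the rest to instructors. A minimal sketch of that gating follows; the field names, threshold, and records are invented for illustration and are not Pensieve's API or data.

```python
def gated_agreement(records, conf_threshold=0.9):
    """Split AI-graded records into auto-accepted (high confidence) and
    human-routed, and report AI-instructor agreement on the accepted set."""
    accepted = [r for r in records if r["confidence"] >= conf_threshold]
    routed = [r for r in records if r["confidence"] < conf_threshold]
    agree = sum(1 for r in accepted if r["ai"] == r["instructor"])
    rate = agree / len(accepted) if accepted else float("nan")
    return len(routed), rate

# Hypothetical graded responses with AI confidence ratings.
records = [
    {"ai": 8, "instructor": 8, "confidence": 0.97},
    {"ai": 5, "instructor": 5, "confidence": 0.93},
    {"ai": 7, "instructor": 6, "confidence": 0.95},
    {"ai": 4, "instructor": 7, "confidence": 0.42},
]
print(gated_agreement(records))
```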
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Yeadon, Will, Peach, Alex, Testrow, Craig P.
This study evaluates the performance of ChatGPT variants, GPT-3.5 and GPT-4, both with and without prompt engineering, against solely student work and a mixed category containing both student and GPT-4 contributions in university-level physics coding assignments using the Python language. Comparing 50 student submissions to 50 AI-generated submissions across different categories, all marked blindly by three independent markers, we amassed $n = 300$ data points. Students averaged 91.9% (SE: 0.4), surpassing the highest-performing AI submission category, GPT-4 with prompt engineering, which scored 81.1% (SE: 0.8) - a statistically significant difference (p = $2.482 \times 10^{-10}$). Prompt engineering significantly improved scores for both GPT-4 (p = $1.661 \times 10^{-4}$) and GPT-3.5 (p = $4.967 \times 10^{-9}$). Additionally, the blinded markers were tasked with guessing the authorship of the submissions on a four-point Likert scale from 'Definitely AI' to 'Definitely Human'. They accurately identified the authorship, with 92.1% of the work categorized as 'Definitely Human' being human-authored. Simplifying this to a binary 'AI' or 'Human' categorization resulted in an average accuracy rate of 85.3%. These findings suggest that while AI-generated work closely approaches the quality of university students' work, it often remains detectable by human evaluators.
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (0.84)
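The collapse from the four-point Likert authorship scale to a binary AI/Human accuracy figure, as described in the abstract above, can be sketched as follows. The two endpoint labels mirror the abstract; the intermediate labels and all data are invented for illustration.

```python
def binary_accuracy(guesses, truths):
    """Collapse four-point Likert authorship guesses into a binary
    AI/Human call and score them against the true authorship."""
    to_binary = {
        "Definitely AI": "AI", "Probably AI": "AI",          # assumed labels
        "Probably Human": "Human", "Definitely Human": "Human",
    }
    calls = [to_binary[g] for g in guesses]
    hits = sum(c == t for c, t in zip(calls, truths))
    return hits / len(truths)

# Toy marker guesses and true authorship for four submissions.
guesses = ["Definitely Human", "Probably AI", "Definitely AI", "Probably Human"]
truths = ["Human", "AI", "Human", "Human"]
print(binary_accuracy(guesses, truths))
```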
Leveraging Human Feedback to Scale Educational Datasets: Combining Crowdworkers and Comparative Judgement
Machine Learning models have many potentially beneficial applications in education settings, but a key barrier to their development is securing enough data to train these models. Labelling educational data has traditionally relied on highly skilled raters using complex, multi-class rubrics, making the process expensive and difficult to scale. An alternative, more scalable approach could be to use non-expert crowdworkers to evaluate student work; however, maintaining sufficiently high levels of accuracy and inter-rater reliability when using non-expert workers is challenging. This paper reports on two experiments investigating using non-expert crowdworkers and comparative judgement to evaluate complex student data. Crowdworkers were hired to evaluate student responses to open-ended reading comprehension questions. Crowdworkers were randomly assigned to one of two conditions: the control, where they were asked to decide whether answers were correct or incorrect (i.e., a categorical judgement), or the treatment, where they were shown the same question and answers, but were instead asked to decide which of two candidate answers was more correct (i.e., a comparative/preference-based judgement). We found that using comparative judgement substantially improved inter-rater reliability on both tasks. These results are in line with well-established literature on the benefits of comparative judgement in the field of educational assessment, as well as with recent trends in artificial intelligence research, where comparative judgement is becoming the preferred method for providing human feedback on model outputs when working with non-expert crowdworkers. However, to our knowledge, these results are novel and important in demonstrating the beneficial effects of using the combination of comparative judgement and crowdworkers to evaluate educational data.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Armenia (0.04)
- (2 more...)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.67)
- Research Report > Strength High (0.54)
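Pairwise "which answer is more correct" judgements like those in the treatment condition above are commonly scaled into a ranking with a Bradley-Terry model. A minimal sketch using the standard minorize-maximize update follows; the data are toy judgements, and the paper does not say which scaling model, if any, it used.

```python
from collections import defaultdict

def bradley_terry(pairs, n_iter=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairwise judgements
    using the standard MM update; higher strength = preferred more often."""
    items = {x for pair in pairs for x in pair}
    wins = defaultdict(int)
    matches = defaultdict(int)  # unordered pair -> number of comparisons
    for winner, loser in pairs:
        wins[winner] += 1
        matches[frozenset((winner, loser))] += 1
    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = 0.0
            for key, m in matches.items():
                if i in key:
                    j = next(x for x in key if x != i)
                    denom += m / (p[i] + p[j])
            new[i] = wins[i] / denom if denom else p[i]
        total = sum(new.values())
        p = {i: v / total for i, v in new.items()}  # normalise strengths
    return p

# Toy judgements: answer A beats B twice, B beats C once, A beats C once.
pairs = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C")]
strengths = bradley_terry(pairs)
print(max(strengths, key=strengths.get))
```

The same family of models underlies preference-based human feedback in recent AI research, which is the connection the abstract draws.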
How ChatGPT Can Help with Grading • TechNotes Blog
I enjoy teaching, but I don't enjoy grading. Using rubrics makes grading easier, but it can still be a chore to review each assignment with fresh eyes so that the student you are grading now gets the same attention as the first few. This is where ChatGPT can come in handy. It doesn't get tired of grading. And if you have a tight rubric (well-designed with little or no loopholes), you can expect consistent results from ChatGPT, but there are a few important things to consider.
AI arrives on college campuses: How students are using ChatGPT for essays, research and more
Ready or not, the AI revolution is upon us and one of its most immediate impacts is the emergence of chatbots like ChatGPT. "It will be a boon to the societies that pick this up," said junior student leader and president of the Metropolitan State University of Denver Chess Club Paul Nelson. Nelson is talking about ChatGPT and its rapid emergence on college campuses throughout the U.S. One educator at MSU Denver said the first time he heard of the chatbot was in November and now, four months later, it's a part of almost every conversation he has. "My first reaction when I first saw ChatGPT was, 'Oh my God. We are in trouble,'" said Dr. David Merriam, assistant professor of biology.
- Education (0.73)
- Leisure & Entertainment > Games > Chess (0.56)
UAB cybersecurity program ranked No. 1 - Yellowhammer News
Fortune ranked the University of Alabama at Birmingham's in-person master's degree in cybersecurity as the No. 1 program in the country. According to Fortune, there are nearly 770,000 cybersecurity job openings in the United States. "We are proud to be recognized for academic excellence by Fortune and named the nation's leading institution for graduate studies in cybersecurity," said UAB Provost and Senior Vice President for Academic Affairs Pam Benoit. "UAB's Department of Computer Science has created an outstanding collaborative master's degree program that prepares students to lead careers solving the world's most challenging cybersecurity problems." Fortune's first-ever ranking of in-person cybersecurity master's degree programs compared 14 programs across the United States in three components: Selectivity Score, Success Score and Demand Score.
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Cyberwarfare (1.00)
- Education > Educational Setting > Higher Education (1.00)
Various Roles of AI (Artificial Intelligence) in Education
The role of AI in education is to provide personalized learning experiences for students and to assist educators in the classroom. AI can provide students with individualized feedback and recommendations based on their learning progress. AI can also help educators to identify areas where students may need extra support. Thus, in this blog post, I shall highlight the roles, AI can play in teaching, learning, and assessment. AI for Teaching Let us see, what role AI can play in teaching to improve the learning outcome. The role of AI in teaching is to provide educators with tools and resources that...
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting (0.95)